Hierarchical relational data analysis using Python

In this notebook, we will learn how to analyse hierarchical data.

Agenda:

1. Loading our data
2. Data Preprocessing
3. Data Exploration
4. What is Data Granularity?
5. Answering some analytical questions
6. Treemap and Sunburst diagrams

Importing modules

1. Loading the dataset

This data represents information about product sales in different stores in Mexico.
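As a minimal sketch of the loading step, the snippet below reads a tiny in-memory CSV with `pd.read_csv`. The rows, the product names, and the `Store_City` and `Units` column names are assumptions made for illustration; the other column names follow the ones mentioned in this notebook. With the real dataset, a file path would be passed instead of the `StringIO` object.

```python
import io
import pandas as pd

# Hypothetical stand-in for the sales file (Store_City and Units are assumed column names)
sample_csv = io.StringIO(
    "Date,Store_City,Store_location,Product_Name,Product_Category,Product_cost,Product_Price,Units\n"
    "2017-01-01,Santiago,Downtown,Lego Bricks,Toys,34.99,39.99,2\n"
    "2017-01-02,Santiago,Airport,Colorbuds,Electronics,6.99,14.99,1\n"
    "2018-03-05,Monterrey,Downtown,Lego Bricks,Toys,34.99,39.99,1\n"
)

df = pd.read_csv(sample_csv)

print(df.head())   # first rows of the table
print(df.dtypes)   # Date is read as text here, not as a datetime column
```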

2. Some preprocessing

Converting the Date column to datetime dtype
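A short sketch of this conversion, assuming the dates are stored as ISO-formatted strings (the sample values are made up):

```python
import pandas as pd

# Hypothetical sample: dates stored as text
df = pd.DataFrame({"Date": ["2017-01-01", "2017-01-02", "2018-03-05"]})

# Parse the strings into a datetime column so that year, month, and day can be extracted later
df["Date"] = pd.to_datetime(df["Date"])

print(df["Date"].dtype)
print(df["Date"].dt.year.tolist())
```

If the real file uses another date layout, the `format` argument of `pd.to_datetime` can be used to state it explicitly.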

Let's check the data again

3. Data Exploration

How many sale records do we have?
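One simple way to answer this, sketched on a hypothetical three-row frame, is to count the rows, since each row is one sale record:

```python
import pandas as pd

# Hypothetical sample rows standing in for the sales data
df = pd.DataFrame({
    "Product_Name": ["Lego Bricks", "Colorbuds", "Lego Bricks"],
    "Units": [2, 1, 1],
})

n_records = len(df)         # number of rows, one per sale record
n_rows, n_cols = df.shape   # shape gives (rows, columns)
print(f"{n_records} sale records, {n_cols} columns")
```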

Let's draw the histogram of the Store_location column
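For a categorical column, the histogram is just the number of records per category. A minimal sketch with made-up rows, using `value_counts` to get the counts that the bars would show:

```python
import pandas as pd

# Hypothetical sample of store locations, one entry per sale record
df = pd.DataFrame({
    "Store_location": [
        "Downtown", "Airport", "Downtown", "Commercial", "Residential", "Downtown",
    ]
})

# Count the sale records per location, sorted from most to least frequent
location_counts = df["Store_location"].value_counts()
print(location_counts)
```

If matplotlib is installed, `location_counts.plot(kind="bar")` would draw these counts as bars.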

Insights:

There are four different store locations in our data.
It seems that the stores located in Downtown have the most sale records and the stores located in Airport have the fewest.

Let's draw the histogram of the Product_Category column

What are the insights?

Draw the Product_cost histogram

Draw the Product_Price histogram
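For numeric columns such as Product_cost and Product_Price, a histogram groups the values into bins. The sketch below computes the bin counts with `np.histogram` on made-up prices; the choice of four bins is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample prices
df = pd.DataFrame({"Product_Price": [2.99, 5.99, 9.99, 14.99, 15.99, 24.99, 39.99]})

# Split the price range into 4 equal-width bins and count the values in each bin
counts, bin_edges = np.histogram(df["Product_Price"], bins=4)

for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

The same call works for Product_cost by swapping the column name.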

Let's add a column called "Profit" that holds the net income of each sale.
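One plausible way to compute this column is sketched below. It assumes that profit per sale is the price minus the cost, multiplied by the number of units sold, and that the unit count lives in a column named `Units`; both are assumptions, not details confirmed by this notebook.

```python
import pandas as pd

# Hypothetical sample rows; Units is an assumed column name
df = pd.DataFrame({
    "Product_cost": [34.99, 6.99, 9.99],
    "Product_Price": [39.99, 14.99, 15.99],
    "Units": [2, 1, 3],
})

# Assumed formula: margin per unit times units sold
df["Profit"] = (df["Product_Price"] - df["Product_cost"]) * df["Units"]
print(df)
```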

4. Understanding Data Granularity

Let's say we want to analyze the data for the city of Santiago.

Let's calculate the daily sale profits in Santiago
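A minimal sketch of this step: filter the rows for Santiago, then group by Date and sum the profit. The city column name `Store_City` and the sample rows are assumptions.

```python
import pandas as pd

# Hypothetical sample; Store_City is an assumed column name
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02"]),
    "Store_City": ["Santiago", "Santiago", "Santiago", "Monterrey"],
    "Profit": [10.0, 8.0, 12.0, 5.0],
})

# Keep only the Santiago records
santiago = df[df["Store_City"] == "Santiago"]

# One row per day: the total profit of all Santiago sales on that date
daily_profit = santiago.groupby("Date")["Profit"].sum()
print(daily_profit)
```

The result is a Series indexed by date, which is the shape a line chart needs; with matplotlib installed, `daily_profit.plot()` would draw it.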

Let's visualize this data using a line chart

Data granularity is the level of detail at which we can see our data. For example, daily profits are finer-grained than monthly profits.

What if we have the following analytical question:
Compare the monthly profit in 2017 and 2018 and find the month with the highest monthly profit.

Let's create a column called MONTH and extract the month value from the Date column
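The sketch below shows one way to move from daily to monthly granularity. It extracts MONTH from Date and, as an assumption for the 2017 versus 2018 comparison, also adds a hypothetical `YEAR` helper column. The sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2017-01-15", "2017-01-20", "2017-02-03", "2018-01-10", "2018-02-14"]
    ),
    "Profit": [10.0, 5.0, 7.0, 20.0, 4.0],
})

df["MONTH"] = df["Date"].dt.month
df["YEAR"] = df["Date"].dt.year   # hypothetical helper column for the year comparison

# Rows are months, columns are years, cells are the total profit
monthly_profit = df.groupby(["YEAR", "MONTH"])["Profit"].sum().unstack("YEAR")

# For each year, the month with the highest total profit
best_month = monthly_profit.idxmax()

print(monthly_profit)
print(best_month)
```

Putting the years side by side as columns makes the table easy to pass to a grouped bar chart.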

Let's check our data again

Let's use a bar chart to visualize the results

Let's answer some analytical questions.

5. Answering some analytical questions

Question #1 - Find the products with the highest profit.

- Aggregate our data by Product_Name and sum up the profits.
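This aggregation can be sketched with `groupby` followed by `sum`, sorted so the most profitable products come first. The product names and profit values are made-up sample data.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Product_Name": ["Lego Bricks", "Colorbuds", "Lego Bricks", "Deck Of Cards"],
    "Profit": [10.0, 8.0, 5.0, 2.0],
})

# One row per product with its total profit, highest first
profit_by_product = (
    df.groupby("Product_Name")["Profit"]
    .sum()
    .sort_values(ascending=False)
)
print(profit_by_product.head(10))
```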

Let's use a bar chart to visualize the data

Question #2 - Find the product categories with the highest profit.

  - We have to aggregate our data by Product_Category and sum up the profits.
  - This means we are going to summarize our data at the Product_Category level.

Let's use a bar chart to visualize the data

Question #3 - Find the products with the highest profit in the different categories.

- We have to aggregate our data by Product_Name and Product_Category and sum up the profits.
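A minimal sketch of this two-level aggregation follows. It also keeps the top product of each category, which is one possible reading of the question; the sample rows are made up.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Product_Category": ["Toys", "Toys", "Electronics", "Electronics", "Toys"],
    "Product_Name": ["Lego Bricks", "Dino Egg", "Colorbuds", "Gamer Headphones", "Lego Bricks"],
    "Profit": [10.0, 3.0, 8.0, 6.0, 5.0],
})

# Total profit for every (category, product) pair, highest first
profit_by_cat_product = (
    df.groupby(["Product_Category", "Product_Name"], as_index=False)["Profit"]
    .sum()
    .sort_values("Profit", ascending=False)
)

# Because the rows are sorted, the first row seen for each category is its top product
top_per_category = profit_by_cat_product.drop_duplicates("Product_Category")

print(profit_by_cat_product)
print(top_per_category)
```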

Question #4 - Which product in the electronics category had the highest profit?
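One way to approach this question is to filter the rows to the electronics category first and then aggregate by product. The sketch assumes the category is labelled exactly "Electronics" in Product_Category; the rows are made up.

```python
import pandas as pd

# Hypothetical sample rows; the "Electronics" label is an assumption
df = pd.DataFrame({
    "Product_Category": ["Toys", "Electronics", "Electronics", "Electronics"],
    "Product_Name": ["Lego Bricks", "Colorbuds", "Gamer Headphones", "Colorbuds"],
    "Profit": [10.0, 8.0, 6.0, 4.0],
})

# Keep only electronics, then total the profit per product
electronics = df[df["Product_Category"] == "Electronics"]
profit_by_product = electronics.groupby("Product_Name")["Profit"].sum()

# idxmax returns the index label (the product name) of the largest total
best_product = profit_by_product.idxmax()
print(best_product, profit_by_product.max())
```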

Question #5 - In which locations are toys the best sellers?
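A possible sketch for this question: build a location by category table and check which category wins in each location. It assumes "best seller" is measured by total profit (unit counts could be used instead) and that the category is labelled "Toys"; the rows are made up.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Store_location": ["Downtown", "Downtown", "Airport", "Airport", "Commercial", "Commercial"],
    "Product_Category": ["Toys", "Electronics", "Toys", "Electronics", "Toys", "Games"],
    "Profit": [20.0, 12.0, 4.0, 9.0, 7.0, 3.0],
})

# Rows are locations, columns are categories, cells are total profit (0 where no sales)
profit_table = (
    df.groupby(["Store_location", "Product_Category"])["Profit"]
    .sum()
    .unstack(fill_value=0)
)

# For each location, the category with the largest total
top_category = profit_table.idxmax(axis=1)

# Locations where that category is Toys
toy_locations = top_category[top_category == "Toys"].index.tolist()
print(toy_locations)
```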

6. Treemap and Sunburst diagrams

Sunburst Diagram

Let's visualize the same data using a sunburst diagram
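A hedged sketch using Plotly Express, assuming the hierarchy runs from Product_Category (inner ring) to Product_Name (outer ring) and that segment size is the summed Profit. The helper name `make_sunburst` and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Product_Category": ["Toys", "Toys", "Electronics", "Toys"],
    "Product_Name": ["Lego Bricks", "Dino Egg", "Colorbuds", "Lego Bricks"],
    "Profit": [10.0, 3.0, 8.0, 5.0],
})

# Summarize to one row per (category, product) pair
hierarchy = df.groupby(["Product_Category", "Product_Name"], as_index=False)["Profit"].sum()


def make_sunburst(data):
    """Hypothetical helper: build a sunburst figure from the aggregated frame."""
    # Imported here so the aggregation above runs even where plotly is not installed
    import plotly.express as px

    # path lists the hierarchy levels from the centre outwards
    return px.sunburst(data, path=["Product_Category", "Product_Name"], values="Profit")


print(hierarchy)
# In a notebook: make_sunburst(hierarchy).show()
```

Segment sizes are assumed to be non-negative, so loss-making products would need separate handling.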

Treemap Plot

Let's visualize the same data using a treemap diagram

Let's visualize the original data using a treemap
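A minimal sketch of a treemap built from unaggregated rows with Plotly Express. The three-level path (Store_location, Product_Category, Product_Name) is an assumed hierarchy, and `make_treemap` is a hypothetical helper. The pandas `groupby` shows the per-leaf totals that each tile would represent.

```python
import pandas as pd

# Hypothetical sample rows at the original (per-sale) granularity
df = pd.DataFrame({
    "Store_location": ["Downtown", "Downtown", "Downtown", "Airport"],
    "Product_Category": ["Toys", "Toys", "Electronics", "Toys"],
    "Product_Name": ["Lego Bricks", "Lego Bricks", "Colorbuds", "Dino Egg"],
    "Profit": [10.0, 5.0, 8.0, 3.0],
})

# Assumed hierarchy, from the outermost tiles to the innermost
path = ["Store_location", "Product_Category", "Product_Name"]

# Total profit per leaf of the hierarchy
leaf_totals = df.groupby(path)["Profit"].sum()


def make_treemap(data):
    """Hypothetical helper: build a treemap figure from the per-sale rows."""
    # Imported here so the aggregation above runs even where plotly is not installed
    import plotly.express as px

    return px.treemap(data, path=path, values="Profit")


print(leaf_totals)
# In a notebook: make_treemap(df).show()
```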